output: html_document

This report was created using behavseqanalyser v0.1.1-alpha; behaviours were grouped using the MITsoft categorisation (date: 2018-01-31).

This is an analysis of the Ro_testdata project, done with behavseqanalyser v0.1.1-alpha.

The data is grouped by treatment. Data transformation: the data (percentage of time spent performing each behavior) were transformed using the square root method.

##  first second 
##     11     11
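As a small illustration of the square root transformation mentioned above, the sketch below applies it to a few made-up percentages. The vector `pct_time` and its values are hypothetical and do not come from the Ro_testdata project.

```r
# Hypothetical percentages of time spent on a behavior (not real data)
pct_time <- c(0, 1, 4, 25, 100)

# Square root transformation: compresses large values more than small ones
sqrt_pct <- sqrt(pct_time)
print(sqrt_pct)
```

The transformed values span 0 to 10 instead of 0 to 100, which reduces the weight of rare, very large percentages.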

We grouped the variables following the MITsoft argument to get 11 behavior categories. We used the following time windows and got 10 x 8 = 80 variables:

| time_reference | windowstart | windowend | windowname |
|---|---|---|---|
| Bin | 0 | 120 | first 2 hours of recording |
| Bintodark | -120 | 0 | last 2h before night |
| Bintodark | 0 | 180 | first 3h of night |
| Bintodark | 540 | 720 | late night (3h) |
| Bintodark | 720 | 864 | early day (3h) |
| Bintodark | -120 | 864 | full recording |
| lightcondition | DAY | NA | daytime |
| lightcondition | NIGHT | NA | nighttime |

Note that the last window may be truncated if not every dataset reaches 900 min after lights on.
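The count of 80 variables can be sketched as the crossing of behavior categories with the 8 time windows. The naming scheme (category name followed by the window index, as in `Groom4`) is an assumption inferred from the variable names that appear later in this report, and the category names other than Groom, Drink and Eat are placeholders.

```r
# Assumption: one variable per (category, window) pair, named <category><window index>
# "Other1".."Other7" are placeholder names, not actual MITsoft categories
categories <- c("Groom", "Drink", "Eat", paste0("Other", 1:7))
window_names <- c("first 2 hours of recording", "last 2h before night",
                  "first 3h of night", "late night (3h)", "early day (3h)",
                  "full recording", "daytime", "nighttime")

# Cross every category with every window index
vars <- as.vector(outer(categories, seq_along(window_names), paste0))
print(length(vars))  # 10 categories x 8 windows
print(head(vars))
```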

We then run a random forest to rank the variables by their importance in distinguishing the groups. We then take the best 20 and run the random forest again (so that the Gini scores obtained do not depend on the initial number of variables). We plot here the table of variables ordered by weight:

Let’s take a threshold of importance (Gini > 0.95) and get all variables satisfying the filter, or at least 8 variables:
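The two-stage ranking described above could look roughly like the following sketch, which uses the `randomForest` package on synthetic data. The data, the object names (`dat`, `ranking`, and so on) and the choice of `ntree = 500` are assumptions for illustration; the report's own implementation may differ.

```r
library(randomForest)  # randomForest(), importance()

set.seed(1)
# Synthetic stand-in: 22 animals, 80 variables, two groups of 11
groupingvar <- factor(rep(c("first", "second"), each = 11))
dat <- as.data.frame(matrix(rnorm(22 * 80), nrow = 22))
dat[groupingvar == "second", 1:3] <- dat[groupingvar == "second", 1:3] + 2

# Stage 1: rank all variables by mean decrease in Gini impurity
rf_all <- randomForest(x = dat, y = groupingvar, ntree = 500)
imp_all <- importance(rf_all, type = 2)[, "MeanDecreaseGini"]
top20 <- names(sort(imp_all, decreasing = TRUE))[1:20]

# Stage 2: refit on the best 20 so the scores no longer depend on
# the initial number of variables
rf_top <- randomForest(x = dat[, top20], y = groupingvar, ntree = 500)
ranking <- sort(importance(rf_top, type = 2)[, "MeanDecreaseGini"],
                decreasing = TRUE)
print(head(ranking))
```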

Groom4, Drink6, Drink8, Eat1, Eat6, Drink7, Drink2 and Eat4
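The selection rule (keep everything above the threshold, but never fewer than 8 variables) can be sketched in base R as below. The function name `select_variables` and the importance scores are hypothetical.

```r
# Keep all variables with importance > threshold, or the top `min_n`
# variables if fewer than `min_n` pass the threshold
select_variables <- function(importance, threshold = 0.95, min_n = 8) {
  ranked <- sort(importance, decreasing = TRUE)
  n_keep <- max(sum(ranked > threshold), min_n)
  names(ranked)[seq_len(min(n_keep, length(ranked)))]
}

# Hypothetical Gini scores for ten variables (not taken from the report)
imp <- c(v1 = 1.3, v2 = 1.1, v3 = 0.9, v4 = 0.8, v5 = 0.7,
         v6 = 0.6, v7 = 0.5, v8 = 0.4, v9 = 0.3, v10 = 0.2)

# Only two scores exceed 0.95, so the minimum of 8 applies
selected <- select_variables(imp)
print(selected)
```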

Plotting

First, let's plot the 2 most discriminative variables according to the random forest:

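library(ggplot2)  # ggplot(), aes(), geom_point(), labs(), theme_bw()

# R2 is assumed to hold the variables ordered by importance (names in column 1)
top2 <- as.character(R2[1:2, 1])
Plot <- Multi_datainput_m[, names(Multi_datainput_m) %in% top2]
Plot <- cbind(Multi_datainput_m$groupingvar, Plot)
# Build the title from the original variable names before renaming the columns
Title_plot <- paste0(names(Plot)[2], "x", names(Plot)[3])
names(Plot) <- c("groupingvar", "discriminant1", "discriminant2")
p <- ggplot(Plot, aes(y = discriminant1, x = discriminant2, color = groupingvar)) +
  geom_point() +
  labs(title = Title_plot) +
  # scale_x_log10() + scale_y_log10() +
  scale_colour_grey() + theme_bw() +
  theme(legend.position = "none")
print(p)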

Here, we plot the first two or three components obtained after an ICA performed on the reduced data:

PCA strategy

The PCA strategy shows that the behavior profiles of the two groups of animals are not identical.

We performed a PCA on the data and tested whether the groups differ in their first component score, using a Mann-Whitney test or a Kruskal-Wallis rank sum test (if more than 2 groups exist). We plot here the first component in a boxplot:

NB: This strategy is fairly robust against type I errors. On the other hand, it may well overlook existing differences.
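A minimal base-R sketch of this strategy is shown below, run on synthetic data. The data and the object names (`dat`, `pc1`, `res`) are assumptions; only the choice of test (Mann-Whitney for two groups, Kruskal-Wallis otherwise) follows the description above.

```r
set.seed(42)
# Synthetic stand-in: 22 animals x 6 variables, two groups of 11
groupingvar <- factor(rep(c("first", "second"), each = 11))
dat <- matrix(rnorm(22 * 6), nrow = 22)
dat[groupingvar == "second", 1:3] <- dat[groupingvar == "second", 1:3] + 1

# PCA on centred and scaled variables; keep the first component score
pca <- prcomp(dat, center = TRUE, scale. = TRUE)
pc1 <- pca$x[, 1]

# Mann-Whitney for two groups, Kruskal-Wallis for more than two
res <- if (nlevels(groupingvar) > 2) {
  kruskal.test(pc1 ~ groupingvar)
} else {
  wilcox.test(pc1 ~ groupingvar)
}
print(res$p.value)

boxplot(pc1 ~ groupingvar, ylab = "PC1 score")
```

Testing a single component keeps the number of comparisons at one, which is why the approach is conservative: differences carried by later components go untested.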

SVM

We perform an SVM on the full data or the reduced data and compare the results. For that, we split the data into training and test sets, tune the SVM for the best parameters, then run it and report the overall accuracy (kappa) as the output. This accuracy (0.6666667) was tested for significance using a permutation strategy. We performed 1 permutation. (Each permutation randomly reassigns the group labels in the training data, tunes an SVM and applies it to the (non-randomised) test set; the resulting prediction score (kappa) is saved. We use a binomial confidence interval to calculate a p value.)

The SVM procedure could not tell the two groups apart.
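The permutation scheme can be sketched with the `e1071` package as follows. This is an illustration on synthetic data, not the report's code: the tuning step is left out for brevity, the helper `kappa_on_test` is hypothetical, and only the names `Accuracyreal` and `Acc_sampled` and the sigmoid kernel are taken from this report.

```r
library(e1071)  # svm(), classAgreement()

set.seed(1)
# Synthetic stand-in: 40 samples, 5 variables, two groups of 20
n <- 40
x <- matrix(rnorm(n * 5), ncol = 5)
y <- factor(rep(c("first", "second"), each = n / 2))
x[y == "second", 1] <- x[y == "second", 1] + 1.5

train <- sample(n, 30)  # indices of the training set

# Fit on the training data with the given labels, score on the untouched test set
kappa_on_test <- function(labels_train) {
  fit <- svm(x[train, ], labels_train, kernel = "sigmoid")
  pred <- predict(fit, x[-train, ])
  classAgreement(table(pred, y[-train]))$kappa
}

# Score with the real labels
Accuracyreal <- kappa_on_test(y[train])

# Scores with shuffled training labels; the test labels stay as they are
Acc_sampled <- replicate(50, kappa_on_test(sample(y[train])))

print(Accuracyreal)
print(summary(Acc_sampled))
```

With a single permutation, as in this run, the null distribution consists of one value, so the resulting p value cannot be informative; a larger number of permutations is needed for a meaningful test.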


Details: [1] “80 variables: Accuracy of the prediction with sigmoid kernel (Kappa index: 0 denotes chance level, maximum is 1):0.666666666666667”
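For readers unfamiliar with the kappa index, the sketch below computes Cohen's kappa from predicted and observed labels in base R. The function name `cohen_kappa` and the label vectors are hypothetical and unrelated to the result reported above.

```r
# Cohen's kappa: agreement corrected for chance (0 = chance level, 1 = perfect)
cohen_kappa <- function(predicted, observed) {
  lv <- union(levels(factor(predicted)), levels(factor(observed)))
  tab <- table(factor(predicted, levels = lv), factor(observed, levels = lv))
  n <- sum(tab)
  p_obs <- sum(diag(tab)) / n                       # observed agreement
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

# Hypothetical test set: 8 animals, one misclassified in each group
observed  <- rep(c("first", "second"), each = 4)
predicted <- c("first", "first", "first", "second",
               "second", "second", "second", "first")
k <- cohen_kappa(predicted, observed)
print(k)  # 6 of 8 correct, chance agreement 0.5, so kappa = 0.5
```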

Distribution of the accuracy scores with permuted labels, with a vertical line at the score obtained using the real groups.

P value calculation:

library(Hmisc)                                    # Exports `binconf`
k <- sum(abs(Acc_sampled) >= abs(Accuracyreal))   # Two-tailed test
R <- binconf(k, length(Acc_sampled), method = "exact")
print(zapsmall(R)) # 95% CI by default
##  PointEst Lower Upper
##         1 0.025     1
save.image(file = "results.rdata")
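The exact (Clopper-Pearson) interval can also be obtained from base R's `binom.test`, which makes it easy to check the numbers above without Hmisc. The helper `perm_p_ci` is a hypothetical name; `k` is the number of permuted scores at least as extreme as the real one and `n` the number of permutations.

```r
# Point estimate and exact binomial confidence interval for the permutation p value
perm_p_ci <- function(k, n, conf.level = 0.95) {
  ci <- binom.test(k, n, conf.level = conf.level)$conf.int
  c(PointEst = k / n, Lower = ci[1], Upper = ci[2])
}

# One permutation, and that one was at least as extreme as the real score:
# reproduces the interval printed above (1, 0.025, 1)
one_perm <- perm_p_ci(k = 1, n = 1)
print(one_perm)

# Hypothetical larger run, for comparison of interval width
print(perm_p_ci(k = 3, n = 1000))
```

With n = 1 the interval covers almost the whole range from 0 to 1, which shows how little a single permutation says about significance.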